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ABSTRACT 

Context. Weak gravitational lensing is an ideal probe of the dark universe. In recent years, several linear methods have 
been developed to reconstruct the density distribution in the Universe in three dimensions, making use of photometric 
redshift information to determine the radial distribution of lensed sources. 

Aims. In this paper, we aim to address three key issues seen in these methods; namely, the bias in the redshifts of detected 
objects, the line of sight smearing seen in reconstructions, and the damping of the amplitude of the reconstruction 
relative to the underlying density. We also aim to detect structures at higher redshifts than have previously been 
achieved, and to improve the line of sight resolution of our reconstructions. 

Methods. We consider the problem under the framework of compressed sensing (CS). Under the assumption that the 
data are sparse or compressible in an appropriate dictionary, we construct a robust estimator and employ state-of-the- 
art convex optimisation methods to reconstruct the density contrast. For simplicity in implementation, and as a proof 
of concept of our method, we reduce the problem to one-dimension, considering the reconstruction along each line of 
sight independently. We also assume an idealised survey in which the redshifts of sources are known. 
Results. Despite the loss of information inherent in our one-dimensional implementation, we demonstrate that our 
method is able to accurately reproduce cluster haloes up to a redshift of Zd = 1.0, deeper than state-of-the-art linear 
methods. We directly compare our method with these linear methods, and demonstrate minimal radial smearing and 
redshift bias in our reconstructions, as well as a reduced damping of the reconstruction amplitude as compared to 
the linear methods. In addition, the CS framework allows us to consider an underdetermined inverse problem, thereby 
allowing us to reconstruct the density contrast at finer resolution than the input data. 

Conclusions. The CS approach allows us to recover the density distribution more accurately than current state-of-the-art 
linear methods. Specifically, it addresses three key problem areas inherent in linear methods. Moreover, we are able to 
achieve super-resolution and increased high-redshift sensitivity in our reconstructions. 

Key words. Gravitational lensing: weak, Methods: statistical, Techniques: image processing, Cosmology: observations, 
Galaxies: clusters: general, Cosmology: large scale structure of Universe 



1. Introduction 

Weak gravitational lensing has become a powerful tool to 
study the dark universe, allowing us to place constraints on 
key cosmological parameters, and offering the possibility to 
place independent constraints on the dark energy equation 
of state parameter, w (Levy & Brustein 2009; Hoekstra & 
Jain 2008; Munshi et al. 2008; Albrecht et al. 2006; Peacock 
et al. 2006; Schneider 2006; Van Waerbeke & Mellier 2003). 

Until recently, weak lensing studies considered the shear 
signal, and recovered the mass distribution, in two dimen- 
sional projection (see Schneider 2006, for a review of weak 
lensing). However, with improved data quality and wide- 
band photometry, it is now possible to recover the mass dis- 
tribution in three dimensions by using photometric redshift 
information to deproject the lensing signal along the line of 
sight (Simon et al. 2009; Massey et al. 2007a,b; Taylor et al. 
2004). 

Under the assumption of Gaussian noise, various linear 
methods have been developed to recover the 3D matter 
distribution, which rely on the construction of a pseudo- 
inverse operator to act on the data, and which include a 
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penalty function encoding the prior that is to be placed on 
the signal (VanderPlas et al. 2011; Simon et al. 2009; Castro 
et al. 2005; Taylor et al. 2004; Bacon & Taylor 2003; Hu & 
Keeton 2002). 

These methods produce promising results; however, 
they show a number of problematic artefacts. Notably, 
structures detected using these methods are strongly 
smeared along the line of sight, the detected amplitude 
of the density contrast is damped (in some cases, very 
strongly), and the detected objects are shifted along the 
line of sight relative to their true positions. Such effects 
result from the choice of method used; Simon et al. (2009) 
note that their choice of filter naturally gives rise to a biased 
solution, and VanderPlas et al. (2011) suggest that linear 
methods might be fundamentally limited in the resolution 
attainable along the line of sight as a result of the smearing 
effect seen in these methods. 

Furthermore, these methods are restricted to deal solely 
with the overdetermined inverse problem. In other words, 
the resolution obtainable on the reconstruction of the den- 
sity is limited to be, at best, equal to that of the input 
data. Thus, the resolution of the reconstruction is entirely 
limited by the quality of the data and its associated noise 
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levels, with no scope for improvement by judicious choice 
of inversion or denoising method. 

In this paper, we consider the deprojection of the lensing 
signal along the line of sight as an instance of compressed 
sensing, where the sensing operator models the line-of-sight 
integration of the matter density giving rise to the lensing 
signal. For simplicity in implementation, and as a proof of 
concept of our method, we consider only the line of sight 
transformation, reducing the 3D weak lensing problem to a 
one-dimensional inversion, with each line of sight in an im- 
age treated as independent. We note, however, that the al- 
gorithm presented here can be cheaply generalised to three 
dimensions. 

We adopt a sparse prior on our reconstruction, and use 
a state-of-the art iterative reconstruction algorithm drawn 
from convex analysis, optimisation methods and harmonic 
analysis set within a compressed sensing framework. This 
enables us to find a robust estimator of the solution without 
requiring any direct prior knowledge of the statistical dis- 
tribution of the signal. We note that the compressive sens- 
ing framework allows us to consider an underdetermined 
inverse problem, thereby allowing us to obtain higher res- 
olution on our reconstructions than that provided by the 
input data. 

This method produces reconstructions with minimal 
bias and smearing in redshift space, and with reconstruc- 
tion amplitudes ~ 75% of the true amplitude (or better, in 
some cases) . This is a significant improvement over current 
linear methods, despite the adoption of a simplified, one- 
dimensional algorithm. In addition, our method exhibits an 
apparent increased sensitivity to high redshift structures, 
as compared with linear methods. Our reconstructions do 
exhibit some noise, with false detections appearing along a 
number of lines of sight. However, these tend to be localised 
to one or two pixels, rather than coherent structures, and we 
expect improved noise control with a full three-dimensional 
implementation. 

We note that we do not include photometric redshift 
errors in our simulations and, consequently, the simulations 
shown here should be considered to be idealised. Ma et al. 
(2006) have presented a method to account for such errors 
in lensing measurements. This method has been used by 
various authors in studies on real data (see, e.g. Simon et al. 
2011), and is straightforward to implement in the algorithm 
presented here. A full treatment of photometric redshift 
errors, which is essential in order for the method to be useful 
on real data, will be presented in an upcoming work. 

This paper is structured as follows. In § 2, we out- 
line the weak lensing formalism in three dimensions, and 
outline several linear inversion methods to solve the 3D 
weak lensing problem. We introduce our compressed sens- 
ing framework and describe our proposed algorithm in § 3. 
In § 4, we discuss practical considerations in implement- 
ing the method, and describe our simulated dataset. In § 5 
we demonstrate the performance of our algorithm in recon- 
structing simulated cluster haloes at various redshift s. We 
conclude with a discussion of our results and future appli- 
cations in § 6. 

Throughout the text, we assume ACDM cosmology, 
with = 0.736, 0.264, h 0.71, ag 0.801, 

consistent with the WMAP-7 results (Larson et al. 2011). 



2. 3D Weak Lensing 

The distortion of galaxy images due to the weak lensing ef- 
fect is described, on a given source plane, by the Jacobian 
matrix of the coordinate mapping between source and im- 
age planes: 



A = 



-72 



-72 
1 - A.: + 7i 



(1) 



where K: is the projected dimensionless surface density, and 
7 = 7i + ifl2 is the complex shear. The shear is related to 
the convergence via a convolution in two dimensions: 



where 
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^ = 6^1 +i^2, and an asterisk * represents complex conjuga- 
tion. 

The convergence, in turn, can be related to the three- 
dimensional density contrast S{r) = p{r)/p — 1 by 
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where Hq is the hubble parameter, VLm is the matter den- 
sity parameter, c is the speed of light, a{w) is the scale 
parameter evaluated at comoving distance and 

ri^-V2sin(i^V2^)^ j^>0 

fK{w) = lw, K = , (5) 

[(-i^)-V2sinh([-K]V2^) K <0 

gives the comoving angular diameter distance as a func- 
tion of the comoving distance and the curvature, of the 
Universe. 

If the shear (or convergence) data is divided into A^gp 
redshift bins, and the density contrast reconstruction is di- 
vided into Nip redshift bins (where N^p is not necessarily 
equal to Mp), we can write the convergence z^*^*^ on each 
source plane as 



where 



and 
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(6) 



(7) 



(8) 



Thus, for each line of sight, equation (6) describes a 
matrix multiplication, encoding a convolution along the line 
of sight. It is the inversion of this transformation: 



i^{z) = QS{z) 



(9) 



that is the focus of this paper. We note that the inversion 
of equation (2) can be straightforwardly performed on each 
source plane in Fourier space. 
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2.1. Linear Inversion Methods 

We focus here on the methods presented in Simon et al. 
(2009) and VanderPlas et al. (2011). For a review of other 
linear methods, the reader is referred to Hu & Keeton 
(2002). 

The three dimensional lensing problem is effectively one 
of observing the density contrast convolved with the linear 
operator R, and contaminated by noise, which is assumed 
to be Gaussian. Formally, we can write 

d = Rs + £:, £ - A/'(0,a^) , (10) 

where d is the observation, s the real density and e the 
Gaussian noise. 

The general idea behind linear inversion methods is to 
find a linear operator H which acts on the data vector to 
yield a solution which minimises some functional, such as 
the variance of the residual between the estimated signal 
and the true signal, subject to some regularisation or prior- 
based constraints. 

The simplest instance of such a linear operation is an 
inverse variance filter (Aitken 1934), which weights the data 
only by the noise covariance, and places no priors on the 
signal itself: 

siv = [R^X;-^R]-^R^5]-^d , (11) 

where S = (nn''") gives the covariance matrix of the noise. 

This method proves problematic when the matrices in- 
volved are non-invertible, such as when there are degenera- 
cies inherent in the allowed solution. In order to make the 
problem invertible, some regularisation must be introduced. 
Simon et al. (2009) opt to use a Saskatoon filter (Tegmark 
1997; Tegmark et al. 1997), which combines a Wiener fil- 
ter and an inverse variance filter, with a tuning parameter 
a introduced that allows switching between the two. This 
gives rise to a minimum variance filter, expressed as: 

smv = [o^l + SR'^E-^Rj-^SR'^Xl-^d , (12) 

where S = {ss^) encodes prior information about the signal 
covariance, and 1 is the identity matrix. 

VanderPlas et al. (2011) have recently proposed a filter 
based on the singular value decomposition (SVD) of the in- 
verse variance filter of Equation (11). Under this formalism, 
we can write 

siv = VA-^U^E-^/^d , (13) 

where we have decomposed the matrix R = 5]~"'"/^R = 
UAV''', U'^'U = V'^'V = 1 and A is the square diagonal 
matrix of singular values = An. 

Their filtering consists in defining a cutoff value acut, 
and truncating the matrices to remove all singular values 
with ai < (Jcut- This effectively reduces the noise by re- 
moving the small singular values, which translate into large 
values in the inversion. 

VanderPlas et al. note that the SVD decomposition is 
computationally intensive, and while they do describe a 
method to speed up the process, it may not be practical 
to use this method on large images. 

Similar considerations must be made when using any 
of the linear methods described above, as these all involve 
matrix inversion, which is an 0{N^) process. While opti- 
misations can be found, these methods become excessively 



time- and computer-intensive when large datasets are con- 
sidered. This, in effect, limits the resolution attainable using 
these methods. 

Moreover, as discussed extensively in Simon et al. (2009) 
and VanderPlas et al. (2011), these linear methods give rise 
to a significant bias in the location of detected peaks, damp- 
ing of the peak signal and a substantial smearing of the den- 
sity along the line of sight. The Compressed Sensing (CS) 
theory, described below, allows us to address the lensing in- 
version problem under a new perspective, and we will show 
that these three aspects are significantly improved using a 
non-linear CS approach. 

3. Compressed Sensing Approach 

Linear methods are easy to use, and the variance of each 
estimator is rather direct to compute. Furthermore, these 
methods generally rely on very common tools with efficient 
implementation. However, they are not the most power- 
ful, and including non-Gaussian priors is difficult, especially 
when such priors imply non-linear terms. Obviously, using 
better-adapted priors is required for building a more robust 
estimator. In this paper, we adopt a compressed sensing ap- 
proach in order to construct an estimator that exploits the 
sparsity of the signal that we aim to reconstruct. The esti- 
mator is modelled as a optimisation problem that is solved 
using recent developments from convex analysis and split- 
ting methods. 

3.1. Compressed sensing theory 

We consider some data Yi ( i G [1, ..,m]) acquired through 
the linear system 

r = 0x , (14) 

where is an mxn matrix. Compressed Sensing (Candes & 
Tao 2006; Donoho 2006) is a sampling/compression theory 
based on the sparsity of the observed signal, which shows 
that, under certain conditions, one can exactly recover a 
/c-sparse signal (a signal for which only k pixels have values 
different from zero, out of n total pixels, where k < n) from 
m < n measurements. 

Such a recovery is possible from undersampled data only 
if the sensing matrix verifies the Restricted Isometry 
Property (RIP) (see Candes & Tao 2006, for more details). 
This property has the effect that each measurement Yi con- 
tains some information about all of the pixels of X; in other 
words, the sensing operator acts to spread the informa- 
tion contained in X across many measurements Yi. 

Under these two constraints - sparsity and a transfor- 
mation meeting the RIP criterion - a signal can be recov- 
ered exactly even if the number of measurements m is much 
smaller than the number of unknown n. This means that, 
using CS methods, we will be able to far outperform the 
well-known Shannon sampling criterion. 

The solution X of (14) is obtained by minimizing 

niin||X||i s.t. Y = SX (15) 

where the £i norm is defined by \\X\\-^ = \ Xi \. The £i 
norm is well-known to be a sparsity-promoting function; i.e. 
minimisation of the ii norm yields the most sparse solution 
to the inverse problem. Many optimisation methods have 
been proposed in recent years to minimise this equation. 
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Further details about CS and ii minimisation algorithms 
can be found in Starck et al. (2010). 

In real life, signals are generally not "strictly" sparse, 
but are compressible; i.e. we can represent the signal in a 
basis or frame (Fourier, Wavelets, Curvelets, etc.) in which 
the curve obtained by plotting the obtained coefficients, 
sorted by their decreasing absolute values, exhibits a poly- 
nomial decay. Note that most natural signals and images 
are compressible in an appropriate basis. 

We can therefore reformulate the CS equation above 
(Equation (15)) to include the data transformation matrix 



Lensing efficiency 



mm \\a\ 



s.t. Y S^a 



(16) 



where X = and a are the coefficients of the trans- 

formed solution X in which is generally referred to as 
the dictionary. Each column represents a vector (also called 
an atom)^ which ideally should be chosen to match the fea- 
tures contained in X. If $ admits a fast implicit transform 
(e.g. Fourier transform. Wavelet transform), fast algorithms 
exist to minimise Equation (16). 

One problem we face when considering CS in a given ap- 
plication is that very few matrices meet the RIP criterion. 
However, it has been shown that accurate recovery can be 
obtained as long the mutual coherence between and 

/i0,^ = maxi^/c |^0i,$/e,^|, is low (Candes & Plan 2010). 

The mutual coherence measures the degree of similarity be- 
tween the sparsifying basis and the sensing operator. Hence, 
in its relaxed definition, we consider a linear inverse prob- 
lem Y = S^X as being an instance of CS when: 

1. the problem is underdetermined, 

2. the signal is compressible in a given dictionary 

3. The mutual coherence /i©,^ is low. This will happen ev- 
ery time the matrix A = 0$ has the effect of spreading 
out the coefficients aj of the sparse signal on all mea- 
surements Yi. 

Most CS applications described in the literature are based 
on such a soft CS definition. CS was introduced for the 
first time in astronomy for data compression (Bobin et al. 
2008; Barbey et al. 2011), and a direct link between CS and 
radio-interferometric image reconstruction was recently es- 
tablished in Wiaux et al. (2009), leading to dramatic im- 
provement thanks to the sparse £i recovery (Li et al. 2011). 

3.1.1. CS and Weak Lensing 

The 3D weak lensing reconstruction problem can be seen 
to completely meet the soft-CS criteria above. Indeed, 

1. the problem is undetermined as we seek a higher resolu- 
tion than initially provided by the observations, which 
are noise-limited, 

2. the matter density is seen to be sparsely distributed, 
showing clusters connected by filaments surrounding 
large voids, 

3. the lensing operator encoded by the matrix Q in 
Equation (9) (or equivalently, the combination P^^Q, 
where Pfl- encodes the convolution in Equation (2)) 
spreads out the information about the underlying den- 
sity in a compressed sensing way. 

To highlight point (3) above. Figure 1 shows the rows 
(top panel) and columns (bottom panel) of the transfor- 
mation matrix, Q, encoding weak lensing along the line 
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Fig. 1. Top Panel: Lensing efficiency kernel for given 
source planes as a function of lens redshift. Bottom Panel: 
Lensing efficiency of a given lens plane as a function of 
source redshift. 



of sight. The top panel shows the lensing efficiency ker- 
nel, which reflects the sensitivity of a given source plane to 
the shearing effects of lenses at various redshifts. This is 
the broad convolution kernel of Equation (4). The bottom 
panel shows the effect of a discrete lens at a given redshift 
on source planes, and demonstrates that localised lenses 
give rise to effects that are non-local and affect all sources 
at redshifts greater than the lens redshift. 



3.1.2. Sparsity prior 

Sparse priors have been shown to be very useful in regular- 
ising ill-posed inverse problems (see Fadili & Starck 2009, 
and references therein). In addition, a sparse prior using a 
wavelet basis has been used in many areas of signal pro- 
cessing in astronomy, such as denoising, deconvolution and 
inpainting to recover missing data in images (Starck et al. 
2010) . The idea underlying such priors is that there exists a 
dictionary in which a given dataset is sparsely represented. 
The dictionary used should therefore match as closely as 
possible the shapes of the structures that we aim to detect. 

Many experiments, such as N-body simulations, have 
shown that the matter density in the universe is largely dis- 
tributed in localised clusters, which are connected by thin 
filaments. Because structures in the Universe appear to be 
physically sparse, we may therefore assume that the matter 
density is sparse in a domain adapted for cluster- and curve- 
like structures. Such domains exist and can be constructed 
by gathering several well chosen transforms inside a dictio- 
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nary (e.g. a combination of wavelets and curvelets). The 
dictionary should be chosen in order to match as closely 
as possible the type of structure we aim to recover (e.g. 
isotropic structures for wavelets, filaments for curvelets), 
and may be adapted to include structures that are not 
sparse in the direct domain, but which may be sparsely 
represented in an appropriate dictionary. 

Regularisation using a sparsity constraint can be under- 
stood through a Bayesian framework. We assume that the 
distribution of the solution in the sparsifying dictionary has 
a Laplacian distribution. This is equivalent to constraining 
the ii norm, which promotes sparsity. A Gaussian assump- 
tion, in contrast, constrains the £2 norm, and leads to the 
standard Wiener filtering, a Tikhonov solution or an SVD 
solution, depending on the way we constrain the solution. 

For specific classes of inverse problem, it has even been 
shown that the sparse recovery leads to the exact solution of 
the problem (compressed sensing). Such a behaviour does 
not exist with any other prior than sparsity. Obviously, the 
sparse recovery will be optimal when the signal is sparse, 
in the same way that the Wiener filter is optimal when 
the signal (and noise) is Gaussian. Because the Laplacian 
assumption (in a appropriate space such as wavelets) is 
more applicable than the Gaussian distribution, restoration 
of astronomical data is generally much more efficient using 
sparsity. 

This explains why wavelets have been so successful for 
astronomical image restoration/detection. For the recon- 
struction of clusters along the line of sight, we are in a 
perfect situation for sparse recovery since clusters are not 
resolved due to the bin size, and they can therefore be mod- 
elled as Dirac (5— functions. We therefore take ^ to be a 
(5— function dictionary. Clearly, in this case, the pixel do- 
main is especially appropriate for sparse recovery. We be- 
lieve that the sparse prior is a much better model for this 
kind of data compared to previous methods with implicit 
Gaussian assumptions. Our results in this paper appear to 
support this claim. 

We note that the method presented here is somewhat 
similar to the point-source reconstruction method described 
in Hu & Keeton (2002), though they use £2 minimisa- 
tion combined with a strict prior on the number of haloes 
present along a line of sight. Our method, in contrast, places 
no priors on the number of structures along the line of sight. 

3.2. Problem statement 

Under the CS framework, the reconstruction of the matter 
density amounts to finding the most sparse solution that 
is consistent with the data. There are many different ways 
to formulate such an optimisation problem, and we opt for 
the following: 

min s.t. \ \\d - Rsfs-. < e s G C , (17) 

where the term to minimise is a sparsity-penalty func- 
tion over the dictionary coefficients. The second term in 
Equation (17) above is a data fidelity constraint, with 
being the covariance matrix of the noise and e the allowed 
distance between the estimation and the observation, while 
the final term forces the solution to have values inside a 
given interval, usually C = [— l,+oo["^ for matter overden- 
sity. 



Note that this latter constraint, encoding a hard min- 
imum on the signal to be recovered, is not possible with 
linear methods, and is therefore an additional strength of 
our method. Enforcing such physical constraints on our so- 
lution helps to ensure the recovery of the most physically 
compelling solution given the data. 

In the compressed sensing literature, we can find equiv- 
alent writing of the problem (17), but with a focus on the 
data fidelity. We prefer to directly seek for the sparsest so- 
lution for a given freedom (i.e. e) on the distance to the 
observation, as such distance is usually easier to tune than 
an equilibrium between a regularization and the data fi- 
delity terms. 

Note this optimisation problem can be equivalently ex- 
pressed in a Bayesian framework, assuming Gaussian noise 
and a Laplacian distribution of the dictionary coefficients. 

In order to solve (17), we use the primal-dual splitting 
method of Chambolle & Pock (2011), and is described in 
full in Appendix A. The algorithm is iterative, and effec- 
tively splits the problem into two parts, applying the two 
constraints in equation (17) separately. 

On each iteration, the estimate of the reconstruction 
is compared with the data, and the data fidelity con- 
straint (the second constraint in equation (17)) is applied 
by projection of the residual onto an £2 ball of radius e. 
Independently, the estimate of the solution is projected 
onto the sparsifying basis, and sparsity is imposed through 
soft-thresholding, which minimises the £1 norm of the basis 
coefficients. The threshold level is set by the parameter A, 
which aids in controlling the noise. The noise modelling is 
discussed in section 4.3, whilst the algorithm, and all prac- 
tical considerations related to its implementation, are dis- 
cussed in detail in Appendix B. Figure 2 shows a simplified 
schematic of the algorithm used. 

4. Implementation of the Algorithm 

We first consider the simulated data to be used in the re- 
mainder of this paper, before discussing some practical con- 
siderations important when implementing the algorithm. 

4.1. Cluster Simulations 

In order to test our method, we need to simulate a realistic 
data set. To this end, we consider here a fiducial survey with 
a background galaxy number density Ug = lOOarcmin"^ 
distributed in redshift according to 

p{z) OC ^2^-(l-4^Ao)^-^ ^ (18) 

with zo = 1.0 (Taylor et al. 2007; Kitching et al. 2011), and 
the distribution truncated at a maximum redshift of z^ax = 
2. Figure 3 shows this probability distribution, normalised 
arbitrarily. 

We take the intrinsic dispersion in shear measurements 
to be (j^ = 0.2, and consider a field of 1° x 1° divided 
into a grid of 60 x 60 pixels. These parameters are chosen 
to mimic the data quality expected from next-generation 
surveys such as Euclid (Refregier et al. 2010) and LSST 
(Ivezic et al. 2008; LSST Science Collaboration et al. 2009). 

In each simulated image, one or more clusters are gen- 
erated following an NEW density profile with M200 = 
10^^ M0, c = 3 binned into N^p redshift bins. The effec- 
tive convergence and shear are computed by integrating the 
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Initialise parameters 



Split solution esti- 
mate into two parts 



Project solution estimate 
onto sparsifying basis 



Apply soft threshold at level A. 



Apply transform S~^/^Q 
to solution estimate. 



Apply data fidelity constraint: 
mean square error < e. 



Apply the inverse basis transform 



Apply transpose 
transform Q+S^/^ 



Add the two components; 
apply positivity constraint. 





y|s 

Output the solution 



Fig. 2. Simplified schematic of the reconstruction algo- 
rithm described in the text. 




Fig. 3. Redshift probability distribution of the sources 
as a function of source redshift z for the simulations de- 
scribed in the text. 



lensing signal within each source redshift bin, and Gaussian 
noise is added, scaled appropriately by the number density 
of galaxies within that bin. 

4.2. Reconstructions in ID 

As noted in § 2, the 3D weak lensing problem can be 
reduced to a one-dimensional problem, by taking as our 
data vector the (noisy) lensing convergence along each line 
of sight, which is related to the density contrast through 
Equation (9). Therefore, we take d = i^ij{z) and R = Q, 



and consider each line of sight in our images independently. 
Further, as discussed previously, we take $ to be a 5- 
function dictionary. 

In our simulations, clusters are placed into a region 
where the mean density in the absence of the cluster is 
equal to the mean density of the Universe at that redshift. 
In other words, 5 is constrained to be greater than zero in 
all our simulations. Therefore, the projection onto the con- 
vex set C in algorithm 2 applies a positivity constraint at 
each iteration. 

Clearly, a one-dimensional implementation throws away 
information, as we do not account at all for the correlation 
between neighbouring lines of sight that will arise in the 
presence of a large structure in the image; however, reduc- 
ing the problem to a single dimension is fast and easy to 
implement, and allows us to test the efficacy of the algo- 
rithm using a particularly simple basis function through 
which we impose sparsity. A fully three-dimensional treat- 
ment of the problem, with more accurate noise modelling 
(see below) will be the subject of a future work. 

However, the algorithm used is entirely general; there- 
fore, with appropriate choice of a three-dimensional basis 
set and taking d = 7(^, z) and R = Pfl-Q, one can imple- 
ment this algorithm as a fully three-dimensional treatment 
of the data with no modification to the algorithm itself. 

4.3. Noise Modelling and Control 
Noise Model for the Data 

We assume that the redshifts of the sources are known ex- 
actly, so there is no correlation between the noise in each 
source bin. Therefore, the covariance matrix of the noise 
along the line of sight is diagonal, with 



(19) 



where Apix is the pixel area, ng{zi) is the number density 
of sources in the bin at redshift Zi^ and is the intrinsic 
dispersion in galaxy ellipticity, taken throughout to be 0.2. 

This covariance matrix is used in the evaluation of the 
data fidelity constraint in our algorithm above. Note that 
the covariance matrix is only diagonal if the galaxy red- 
shifts are known exactly. In practice, photometric redshift 
errors mean that each redshift slice in the data is likely to 
be contaminated with a few galaxies whose redshift error 
bars overlap with neighbouring redshift bins. In this case, 
the covariance matrix will have additional, non-diagonal el- 
ements that are non-zero. This is straightforward to model, 
however, for the chosen method of photometric redshift es- 
timations, and our algorithm is entirely general regarding 
the form of the covariance matrix. Therefore, the problem 
of photometric redshift errors is readily tractable in our 
method, and will be presented in a future work. 



Noise Control in the Algorithm 

The noise in the reconstruction is controlled and suppressed 
by two parameters in the algorithm described in Figure 2 
and Appendix B. The first, and most important of these 
parameters is the data fidelity control parameter, e. 

This parameter controls how well the data are fit by 
the reconstruction, with e = implying a perfect fit to the 
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data, which is not possible in the presence of noise. Figure 
4 demonstrates the effect of varying e in the reconstruction 
of two hues of sight from our simulations. 
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Fig. 4. Reconstructions for two lines of sight obtained with 
varying e. The top row shows the input data and recon- 
structed convergence vector, while the bottom row shows 
the reconstructed density contrast. 



Clearly, when e is small, the algorithm attempts to fit 
each data point more closely which, in the presence of noise, 
can result in overfitting of the data (as seen in line of sight 
1) and hence false detections along the line of sight. On the 
other hand, a large e may result in a solution that is not a 
good fit to the data (as seen in line of sight 2). 

The second parameter used to control the noise is the 
soft threshold parameter A, which is used in the algorithm 
to impose the sparsity prior. A threshold set excessively 
high will result in a null solution, whilst a threshold set 
fairly low will allow for more false detections of noise peaks 
along a given line of sight. The appropriate value for this 
threshold should be related to the expected fluctuations in 
the density contrast resulting from noise variations, and 
should scale with the signal to noise in the image. 

Note that while e strongly affects the accuracy of the 
estimation in reproducing the underlying density contrast, 
A simply affects the sparsity of the solution. In other words, 
changing A will not greatly affect the reconstructions of true 
density peaks, but may affect the number of false detections 
and noise peaks seen. Also note that a thresholding A does 
not imply that density peaks with S < X will not be de- 
tected, as soft thresholding is only applied to one part of 
the estimate of the solution. 



5. Results 

5.1. Comparison with Linear Methods 

Firstly, to demonstrate the effectiveness of our method, we 
compare our method directly with the linear methods of 
Simon et al. (2009) and VanderPlas et al. (2011). As these 
linear methods are only defined for N\p < N^p, we consider 
the case of N\p = N^p = 20, and generate a single cluster 
halo at a redshift of Zd = 0.25 following an NFW halo 
profile with M200 10^^ and c = 3. 

In the SVD method of VanderPlas et al. (2011), we take 
Vcut = ^zV X]^'""" erf = 0.01, and in the transverse 

and radial Wiener filtering methods of Simon et al. (2009), 
we take the tuning parameter to be a = 0.05 in both cases. 
For our CS approach, we take the soft threshold parameter 
A = 8, and the data fidelity control parameter e = 3. 

Note that while the linear methods take d = 
7(^, z), R = Pfl-Q, our method takes d = k,{6, z), R = Q 
as before. The noise levels in each case are identical. 

The results are presented in figures 5 and 6. Figure 5 
presents the 2D projections of the reconstructions, com- 
puted by integrating the reconstruction along each line of 
sight, and the ID reconstructions along the four central 
lines of sight. In the 3D renderings of Figure 6, the re- 
constructions from our method, the SVD method and the 
radial Wiener filter method are thresholded at (5 = 3 (i.e. 
the plot only shows ^rec ^ 3), and each is smoothed with a 
Gaussian of width a = 0.7 pix in all three directions. The 
reconstruction from the transverse Wiener filter method is 
heavily damped with respect to the amplitude of the den- 
sity contrast; a threshold of ^ = 5 x 10~^ is chosen in this 
case, and no smoothing is applied as the reconstruction al- 
ready shows a very smooth distribution. 

The SVD method appears, in the ID plots, to identify 
the correct redshift of the cluster, with a small amount of 
line of sight smearing, but the plots show a prominent high- 
redshift peak along the line of sight. We note that it may be 
possible to remove this false detection by raising Vcut^ but at 
the cost of increased line of sight smearing (see VanderPlas 
et al. 2011). The two-dimensional projection consisting of 
the integrated signal along each line of sight is seen to be 
more noisy than our results. The three-dimensional render- 
ing shows that the SVD method does well at identifying 
and localising the cluster, but the resulting reconstruction 
does suffer from widespread noise at a moderate level. 

The radial and tangential Wiener filter methods show 
very little noise in the 2D projections, and the radial Wiener 
filter shows very little smearing of the reconstruction along 
the line of sight. However, neither Wiener method recovers 
correctly the redshift of the cluster. While the transverse 
Wiener filter shows very little noise, it exhibits a broad 
smearing in both the radial and transverse directions, and 
a heavy damping of the amplitude of the reconstruction. 
The results obtained using the radial Wiener filter are con- 
siderably better, with very little noise seen in the recon- 
struction, and with the cluster seen to be well-localised in 
the radial direction. However, the amplitude of the cluster 
reconstruction is a factor of ~ 2.5 smaller than the input 
density. 

Our results are seen to suffer from several prominent, 
pixel-scale false detections along noisy lines of sight not 
associated with the cluster. However, along the four cen- 
tral lines of sight an excellent correlation between the in- 
put density contrast and our reconstruction is seen. One 
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Fig. 6. 3D rendering of the reconstruction of a Zc\ = 0.25 cluster using our method (top), the SVD method (2nd row, 
^cut = 0.01), radial Wiener filter method (3rd row, a = 0.05) and the transverse Wiener filter method (bottom, a = 0.05) 
as described in the text. 
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Fig. 5. Comparison of our method with the hnear meth- 
ods as labelled. The left column shows the 2D projection 
of the reconstruction, while the right column shows the ID 
reconstructions along the four central lines of sight (dashed 
lines). Note that, due to the amplitude damping effect in 
SVD and Wiener reconstructions, the y-axis scaling is dif- 
ferent in each of the line of sight plots. 



line of sight exhibits a prominent high-redshift false detec- 
tion; however this does not appear in the remaining three 
lines of sight and the overall amplitude of the reconstruc- 
tion is around 75% of the true value. The three-dimensional 
rendering demonstrates that the noise in our reconstruc- 
tion shows very little coherent structure (i.e. tends to 
be restricted to isolated pixels), and is largely low-level. 
Moreover, the cluster is incredibly well- localised in redshift 
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Fig. 7. Reconstructions of a cluster at reshift z = 0.25 us- 
ing noisy data with rig = 30 galaxies per square arcminute. 
Shown above are the 2D projection of the reconstruction 
(top left), ID line of sight plots for the four central lines of 
sight (top right) and smoothed 3D rendering of the recon- 
struction, as before. 



space, with the smearing seen in the figure primarily arising 
from the applied smoothing. 

Reconstruction with Noisier Data 

The simulations described in § 4.1 represent a rather opti- 
mistic set of data. Realistically, one might expect a much 
smaller number density of galaxies, so it is important to 
consider how our method fares with noisier data. To this 
end, weak lensing data were simulated for the same cluster 
as described above, but with Ug = 30 galaxies per square 
arcminute. This represents a factor of ~ 1.8 reduction in 
the signal to noise of our data. In order to account for this 
reduction in signal to noise, we increased our soft threshold 
parameter A by a similar factor, taking A = 15, whist we 
again used e = 3. 

The projections and 3D reconstruction obtained using 
this noisier data is shown in Figure 7. The 2D projection 
shows a noisier reconstruction, with more false detections, 
and the central cluster appears less extended. This is to be 
expected, given the lower signal to noise in the data. The 
line of sight plots again show more noise, with the peaks 
of some lines of sight being shifted slightly from the true 
redshift. However, they are well localised around the true 
peak, and their amplitudes match quite well with the true 
underlying density. 

The 3D rendering again shows a well-localised peak at 
the location of the cluster, albeit slightly extended along 
the line of sight. Note also that the false detections con- 
tinue to appear as single pixel-scale detections, rather than 
as extended objects. As before, we believe that a fully three- 
dimensional implementation of our algorithm will reduce 
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the number of such false detections by searching for coher- 
ent structures of larger angular extent than a single pixel. 

While improvements can be made to the noise model 
used, it is clear that, with appropriate choice of param- 
eters, effective noise control can be obtained with noisy 
data, and a very clean reconstruction obtained using our 
method. Here again, our method is seen to improve on the 
bias, smearing and amplitude damping problems seen in 
the linear methods presented above, and the results at this 
noise level are of a comparable or better standard than the 
results from linear methods at higher signal to noise, as 
presented above. 

5.2. Improving the Line of Sight Resolution 

Given the success of our method at reconstructing lines of 
sight at the same resolution as the input data, it is inter- 
esting to consider whether we are able to improve on the 
output resolution of our reconstructions. It is also worth- 
while to test our ability to detect clusters at higher redshifts 
than that considered above, given that compressed sensing 
is specifically designed to tackle underdetermined inverse 
problems. Indeed, noise-free simulations suggest that a res- 
olution improvement of up to a factor of 4 in the redshift 
direction may be possible with this method. 

Therefore, we generate clusters as before, with our data 
binned into A^gp = 20 redshift bins, but aim to reconstruct 
onto Nip = 25 redshift bins. We further consider clusters 
at redshifts of = 0.2, 0.6, and 1.0. Given the changed 
reconstruction parameters, we modify our noise control pa- 
rameters slightly and, for all the results which follow, take 
A = 7.5, e = 2.8. 

Figure 8 shows the two-dimensional projection of the 
reconstruction and the ID reconstruction of the four central 
lines of sight as before. Figure 9 shows the reconstructions 
of these haloes using our method as a three-dimensional 
rendering. The three-dimensional rendering is, as before, 
thresholded at S^ec = 3 and smoothed with a Gaussian of 
width a = 0.7 pix. 

Several features are immediately apparent. Firstly, all 
three clusters are clearly identified by our method. This is 
particularly impressive in the case of the = 1.0 cluster, 
as linear methods have, thus far, been unable to recon- 
struct clusters at such a high redshift (see, e.g. Simon et al. 
2009; VanderPlas et al. 2011). We note that such a detec- 
tion is dependent on the redshift distributions of sources, 
however, and the lack of detection in Simon et al. (2009) 
and VanderPlas et al. (2011) may be due, in part, to their 
choice of probability distribution. However, we also note 
that the background source density in our sample is highly 
diminished behind the Zd = 1.0 cluster (32.5 arcmin"^). 

Again, the three-dimensional renderings indicate that 
there is very little smearing of the reconstruction along 
the line of sight, in contrast with linear methods. This is 
further evidenced by the line-of-sight plots, in which the 
unsmoothed reconstructions show very localised structure. 
Furthermore, the reconstructions exhibit minimal redshift 
bias, and some lines of sight are seen to recover the ampli- 
tude of the density contrast without any notable damping. 

However, again we see several prominent "hot pixels" or 
false detections along noisy lines of sight. Such detections 
are more evident as the cluster moves to high redshift, and 
may be significantly larger than the expected density con- 
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Fig. 8. Reconstructions of single clusters located at a red- 
shift of Zci = 0.2 (top row), Zci = 0.6 (middle row) and 
Zc\ = 1.0 (bottom row). As before, the left column shows 
the two dimensional integrated projection of the reconstruc- 
tion, while the right panel shows the input density contrast 
along the line of sight (solid line) and the ID reconstruction 
along each of the four central lines of sight (dashed lines). 



trast of the cluster. This is expected, as the number density 
of sources behind the lens diminishes. 

We note that these false detections often manifest at 
high redshift, arising out of the overfitting effect seen in 
Figure 4. We also note that the false detections are very 
well localised in both angular and redshift space, and do not 
form coherent large structures, making them easily identi- 
fiable as false detections; they tend to be localised to iso- 
lated pixels. Because of this lack of coherence, we expect 
that a fully three-dimensional implementation would sup- 
press many of these false detections by aiming to detect 
coherent structure in three dimensions, and thereby seek- 
ing structures of larger extent than a single pixel. 

5.3. More Complex Line-of-Sight Structure 

Given the ability of our method to localise structure in red- 
shift space, it is interesting to consider whether the algo- 
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rithm is able to disentangle the lensing signal from two clus- 
ters located along the same line of sight. We consider three 
different cluster pairings - z^i = [0.2,0.6], z^x = [0.2,1.0], 
and Zc\ = [0.6, 1.0] - and the results obtained by our method 
are shown in three-dimensional rendering in Figure 10. We 
find that the reconstructions of individual lines of sight in 
this case is significantly more noisy than before. 

The figure shows the z^x = [0.2,0.6] cluster pairing to 
be reconstructed as a single, coherent structure smeared 
out between the two redshifts. In the other two cases, two 
distinct clusters are observed, but their redshifts are slightly 
biased, and a moderate amount of smearing in the redshift 
direction is seen. This smearing arises from the fact that 
individual lines of sight detect the structures at different 
redshifts; the aggregate effect is an elongation along the 
line of sight. 

While these results are by no means as clean as the 
results obtained for single clusters, it is promising to note 
that we are able to detect the presence of more complex line 
of sight structure, despite the reconstruction of individual 
lines of sight being fairly noisy. It seems clear that in this 
case, a fully three-dimensional treatment of the data which 
takes into account the correlations between neighbouring 
lines of sight and which seeks to reconstruct coherent struc- 
tures would offer improvements in this area. 

6. Summary and Conclusions 

Current approaches to 3D weak lensing involve linear inver- 
sion, where a pseudo-inverse operator is constructed incor- 
porating prior constraints on the statistical distribution of 
the measurement noise and the underlying density. These 
methods are straightforward to construct and implement, 
make use of common tools, and are usually fairly fast. This 
makes them a convenient choice when approaching the 3D 
weak lensing problem. 

However, reconstructions obtained in this way suffer 
from line-of-sight smearing, bias in the detected redshift of 
structures, and a damping of the reconstruction amplitude 
relative to the input. It has further been noted (VanderPlas 
et al. 2011) that the reconstructions obtained using these 
techniques may be fundamentally limited regarding the res- 
olution attainable along the line of sight, due to smearing ef- 
fects resulting from these linear methods. In addition, such 
methods are unable to treat an underdetermined inversion, 
and therefore are limited in their output resolution by the 
resolution of the input data which, in turn, is limited by 
the measurement noise. 

We have presented a new approach to 3D weak lensing 
reconstructions by considering the weak lensing problem to 
be an instance of compressed sensing, where the underlying 
structure we aim to reconstruct is sparsely represented in 
an appropriate dictionary. Under such a framework, we are 
able to consider underdetermined transformations, thereby 
relaxing the constraints on the resolution of the reconstruc- 
tion obtained using our method. 

We have reduced the problem to that of one-dimensional 
reconstructions along the line of sight. Whilst this is clearly 
not optimal, as it throws away a lot of information, it al- 
lows us to simplify the problem and to employ a particu- 
larly simple basis through which we impose sparsity. We 
employ techniques recently developed in the area of con- 
vex optimisation to construct a robust reconstruction al- 
gorithm, and demonstrate that our method closely repro- 



duces the position, radial extent and amplitude of simu- 
lated structures, with very little bias or smearing. This is 
a significant improvement over current linear methods. In 
addition, we have shown that our method produces clean 
results that demonstrate an improvement over linear meth- 
ods, even when the signal to noise in our data is reduced 
by a factor of ~ 2 in line with that expected from current 
lensing surveys. 

Furthermore, we demonstrate an ability to reconstruct 
clusters at higher redshifts than has been attainable using 
linear methods. Although our reconstructions exhibit false 
detections resulting from the noise, these noisy peaks do not 
form coherent structures, and are therefore well-localised 
and easily identifiable as noise peaks. 

We have also tested the ability of our method to recon- 
struct multiple clusters along the line of sight. Whilst the 
reconstructions are noisy, and exhibit a stronger redshift 
bias, damping and smearing than that seen in the single 
cluster case, our method is seen to be sensitive enough to 
detect the presence of more than one structure along the 
line of sight. It is hoped that by improving our method, we 
will be able to more accurately reproduce these structures. 

It is clear from our results that this method holds a lot of 
promise; even in the one dimensional implementation pre- 
sented here, our results clearly show improvements to the 
bias, normalisation and smearing problems seen with linear 
methods. There is much room for improvement, however, 
and a fully three-dimensional implementation of the algo- 
rithm described here (which, as written, is entirely general) 
will be presented in future work. Such a treatment is ex- 
pected to reduce the incidences of false detection in our re- 
constructions, the key to this improvement being the choice 
of an appropriate three-dimensional dictionary with which 
to sparsify the solution. 

In addition, we note that the simulations presented here 
are idealised as compared to real data. Most notably, we do 
not take any account of photometric redshift errors, which 
will be present in any real dataset. This is an important 
source of error in lensing measurements, and in order for 
the method presented above to have any real value in re- 
constructing the matter distribution in a lensing survey, it 
must include treatment of these errors. This will be pre- 
sented in a future work, included in the three-dimensional 
implementation of the algorithm presented here. 

The quality of data available for weak lensing measure- 
ments continues to improve, and the methods by which we 
measure the weak lensing shear are becoming ever more so- 
phisticated. So, too, must the methods we use to analyse the 
data in order to reconstruct the underlying density. While 
linear methods appear to be limited in resolution, and offer 
biased estimators, it is clear that a nonlinear approach such 
as ours does not, and - in a fully three-dimensional imple- 
mentation - may therefore allow us to map the cosmic web 
in far greater detail than has previously been achieved. 
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Appendix A: Solving the optimization problem 

In this section, we explain the development of the recon- 
struction algorithm. First, we will present an overview 
of some key concepts and results from convex optimisa- 
tion, before introducing the primal-dual scheme chosen 
to solve Equation (17), and finally discussing the con- 
vergence criterion for the algorithm. For a complete in- 
troduction to convex analysis, the reader is referred to 
Rockafehar (1970); Lemarechal & Hiriart-Urruty (1996); 
Boyd & Vandenberghe (2004). 

Notation and terminology 

Let H a real Hilbert space; in our case, a finite dimensional 
vector subspace of W^. We denote by ||.|| the norm associ- 
ated with the inner product in / is the identity operator 
on and ||.||^ {p > 1) is the ip norm. A real- valued func- 
tion / is coercive if lim||x||^+oo /(x) = +oo, and is proper 
if its domain is non-empty: 

dom/ = {x G H I /(x) < +00} ^ . 

Lastly, To{l-L) represents the class of all proper lower semi- 
continuous (Isc) convex functions from H to (—00, +00]. 

A.l. Convex optimisation 

The theory of convex optimisation aims to solve problems 
of the form 

X G argminF(x) , (A.l) 

xen 

where F : H ^ M is a convex function. If F represents 
a potential function, for example, then (A.l) will seek for 
the state of equilibrium, i.e. the state that minimises the 
potential. 

Many algorithms exist for solving such problems, and 
an excellent introduction can be found in (Boyd & 
Vandenberghe 2004). Recently, methods have been devel- 
oped which exploit the decomposition of / into a sum of 
smaller convex functions. This can be very helpful if these 
functions show properties that are not preserved when the 
functions are summed. 

For example, if we assume that F can be decomposed 
into a sum of K functions all in ro(H), then solving (A.l) 
is equivalent to solving 

K 

argmin Vi^fe(x) . (A.2) 

k=i 

Such formulation is interesting when the F^ functions show 
properties that are lost while considering the sum F, di- 
rectly. In our case, we seek a formulation such that the 
functions we seek to minimise have proximal operators that 
are simple^ and have closed form. While the sum F may not 
have this property, the functions F^ can be chosen such that 
they satisfy this condition. 

A.2. Proximity operator 

We may now begin to describe the proximal splitting al- 
gorithm used in this work. At the heart of the splitting 
framework is the notion of the proximity operator: 



Definition 1 (Moreau (1962)). Let F G To{n). Then, 
for every x G the function y F{y) + ||x — y||2/2 
achieves its infimum at a unique point denoted by prox^ x. 
The operator prox^ : H ^ H thus defined is the proximity 
operator of F. 

Therefore, the proximity operator of the indicator func- 
tion of a convex set C: 

r if X G C , 
* \oo otherwise , 

is merely its orthogonal projector. Some key properties are 
presented in the following lemma: 

Lemma 2 (Combettes & Wajs (2005)). 

— Separability: Let Fk G ro(H), k G {l,---,i^} and 
let G : (xfc)i</c<K ^ I]/c^/c(xfc). Then ptoxq = 
(prox^Ji<fc<K. 

— Translation: Let F G To{l-L) and G G To{l-L) such 
that \Jx G H,G(x) = F(x - y), y G U. Then Vx G 

proxc' X = y + prox^(x - y). 

A. 3. Primal-dual scheme 

Consider the optimization problem: 

X G argmin G o U(x) + B{x.) , (A.3) 

where G and B are two convex, proper and lower semi- 
continuous functions and U is a linear bounded operator. 
Equation (A.3) can be see as a special case of Equation 
(A.2) where K = 2 and one of the functions contains a 
linear bounded operator. Then the Algorithm 1, proposed 
in Chambolle & Pock (2011), will converge to the solution 
of (A.3), assuming that the proximity operators of G and 
B are easy to compute or known in closed form. 



Algorithm 1: Primal-dual scheme for solving . 



Adapting the arguments of (Chambolle & Pock 
2011), convergence of the sequence (x^)^^^^ generated by 
Algorithm 1 is ensured. 

Proposition 3. Suppose that G and B are two convex, 
proper and lower semi- continuous functions. Let ( = \\U\f , 
choose r > and a such that ar( < 1, and let (x^)^^^ be 
that defined by Algorithm 1. Then, (x^)tGN converges to a 
(non-strict) global minimiser of Equation (A.3) . 



Parameters: The number of iterations A^iter, proximal 

steps cr > and r > 0. 

Initialization: 

x" = x° = $° = 0. 

Main iteration: 

For t = to A^iter - 1, 

- Dual step : ^*+^ = (/ - crproxcr/^)(lVc^ + Ux*). 

- Primal step : x*+^ = prox^^ (x* - tU*^*+^). 

- Update the coefficients estimate: x*^"*^ = 2x*^"'^ — x* 

End main iteration 

Output: Final solution x"" = x^^*^^ 
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First, we must rewrite the problem in Equation (17) in 
an unconstrained form by replacing the constraints by the 
indicator functions of the corresponding constraint sets: 



min 



1 

2d 



5]-2Rs -^ic{s) , (A.4) 



where ti^ie) is the indicatrice function of the ^2-ball of radius 
e and ic the indicatrice function of a closed convex set C. 

Notice that (A.4) can be expressed in the form of (A. 3) 
with, 



G(x,y) 
i?(x) 

U 



l|x|li + «f2(e)(y ■ 

*c(x) , 



"5d) 



(A.5) 
(A.6) 

(A.7) 



We now need only to apply Algorithm 1 in order to compute 
the solution. This requires computation of the three prox- 
imity operators, which are given by the following proposi- 
tion: 

Proposition 4. Let F e To{n). Then, 

— i/F:xh^||x||-L; then its associated proximity operator, 
piox^p, is the component-wise soft-thresholding opera- 
tor with threshold A as defined by Equation (B.l); 

-./F:x^.c(x) = |^ e/.e. 

then its associated proximity operator, prox^^ is the 
Euclidian projector to the convex set C, Prj^-; 

" *^^-^^*^^(^)(^)-\oo else, 

then its associated proximity operator, prox^^ is the pro- 
jector onto the £2 -ball with radius s defined as: 



Algorithm 2: Nonlinear iterative algorithm for solv- 
ing Equation (17). 

Parameters: Choose uj.t > such that cjtG^ < 1 where 



e ; 



+ 11^*112, where ||R||2 is the spectral 



norm of the operator R. 
Initialization: 

0, x° 



0, s° = 0, cc"" = . 
number of elements in d . 



N 
d 

Choose A > . 
Main iteration: 

For n = to Niter - 2, 

— Initialise auxiliary variables: 

— Sparsity-promoting penalty: 



Data fidelity term: 



t^/uj - d/uj if \\t^/uj ■ 



II2 



N 



e^jt^ /uj-d/u;) 



otherwise. 



\\t2 /i^—d/ujll 

W I12 

— Projection on the convex set C: 

x^+i = Prj^ 

— Update current estimate 

End main iteration 



prox^ X 



-X 



else . 



The application of Proposition 4 and Lemma 2 to the 
optimization problem in Equation (A.4), using the the 
primal-dual scheme in Algorithm 1, yields the reconstruc- 
tion algorithm expressed in Algorithm 2. 



Appendix B: Practical Considerations 

Application of the method described in Appendix A to the 
specific case of 3D lensing leads us to Algorithm 2, where 
one can note the projection over the two constraints de- 
scribed above (data fidelity and minimum value of the solu- 
tion), and the operator ^S^a, which imposes a soft threshold 
at a level of X/uj as: 



9x 



Stx 



a ^ {gx{ai))i<i<L 
V\ - A)sign(7^) |7^| > A 



(B.l) 



otherwise 



B.l. Convergence criteria 

The main difficulty with the primal-dual scheme described 
above - indeed, with any iterative algorithm - is to define 
an appropriate convergence criterion. In this case, the dif- 
ference between two successive iterates x^ and x^+^ is not 
bounded. However, the partial primal-dual gap (3pd, defined 
by: 

0pd(x,y) =max(y',Ux) - C?*(y') + B(x)- 

, , (B.2) 

min(y,Ux')-G*(y) + B(x') , 

(when solving (A. 3)) is bounded. Here, G* is the convex 
conjugate of convex function G: 



G* : X max (x, y) — G{y) . 



(B.3) 



Let us define two variables from the sequences (x^)^ and 
(C^)t produced by Algorithm 1: 

AT-l AT-l 

x- = ^5:x*andC^ = i^^*, 



t=0 



^=0 
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which are the accumulation variables at iteration — 1. 
Chambolle & Pock (2011) have shown that the sequence 

defined bv ^^ri bounded, and decreases 

V J NeN 

at a rate of (9(l/t), where t is the iteration number. 

In order to use (B.2) in our context, the two indicatrice 
functions inside (A. 4) are not considered, as they play httle 
role. Therefore, in our case, we may rewrite 0pc^(x, y) as: 

©pd(x,y) « max(y',Ux) - G*(y') - G*(y) , (B.4) 



(B.5) 

11 ■ (B.6) 

We determine the algorithm to have converged when 



G(Ux)-G*(y) , 
A||**x|| 



e5p,(x^-i,C^-i) 



< a. 



(B.7) 



where the appropriate a can be determined from simula- 
tions, and is dependent on a tradeoff between the desired 
level of accuracy of the reconstructed data and the time 
taken to complete the reconstruction. Choosing a to be 
large will result in an estimate of the solution that may be 
some distance away from the solution that would be ob- 
tained if absolute convergence were reached, but which is 
obtained in a small number of iterations. 



5.0x10^ l.OxlO^ 1.5x10^ 2.0x10' 

Number of iterations 

Fig. B.l. The function A(3^, as defined in the text, as a 
function of iteration number. 



Figure B.l shows a characteristic example of the func- 
tion as a function of iteration number N for the sim- 
ulations described in § 5. This function is clearly a smooth, 
largely steadily decreasing function of iteration number, 
and thus an appropriate choice for defining convergence. 
Note that the curve shows some oscillations with itera- 
tion number. Such oscillations arise on lines of sight where 
the noise results in the algorithm having difficulty fitting 
the data within the constraints. With appropriate choice of 
parameters, these oscillations are relatively small and the 
curve eventually becomes smooth. 

We find from experimentation that a = 10~^ yields a 
solution that is sufficiently accurate for our purposes, and 
is sufficient to largely remove the oscillations, thus we set 
this to be our threshold for all reconstructions presented 
above. In addition, we require that each line of sight under- 
goes at least 1500 iterations, to avoid misidentification of 



convergence due to early oscillations of A(3^, specifically 
a strong dip seen in this curve at the start of iteration. 

It is possible, along certain lines of sight, for the estimate 
of the solution to remain constant at zero due to the soft 
thresholding while A(S^ still varies. To account for this, if 

the current cumulative estimate does not vary for 200 
iterations, we assume that convergence has been reached 
for that line of sight. 

We now consider several practical issues involved in the 
implementation of this algorithm. 



B.2. Choice of Step Sizes r 

uj and r control the step size in the evolution of the algo- 
rithm, and are required to be positive and to satisfy the 
inequality 

ujrG^ < 1 (B.8) 



where 6 = 



|$* 



is the sum of the £2 norms 



of the operators used in the algorithm. This sum is domi- 
nated by the second term, as the elements of the Q lensing 
efficiency matrix are small. We have chosen a ^-function 
dictionary, therefore the application of the transformation 

represents a multiplication by the identity matrix. 

Thus we have O ~ ||^*||2 ^ which implies that 
ujT ~ 1 is appropriate. For all the results that follow, we 
choose u = r = 1. Smaller values of these parameters may 
be used with little effect on the resulting solution. However, 
the smaller the values chosen, the smaller the steps taken in 
each iteration of the algorithm. This results in a slower con- 
vergence than seen with larger values of uj and r. Choosing 
larger values than implied by Equation (B.8) is not advised, 
as convergence of the algorithm is not guaranteed in this 
case; the algorithm may give rise to a strongly oscillating 
estimator, and the resulting solution may not be the one 
that best fits the data given the prior constraints. 
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Fig. 9. Reconstructions of single clusters located at a redshift of z^i — 0.2 (top), z^x = 0.6 (middle) and Zc\ 
(bottom). The reconstructions are thresholded at (5 = 3 and smoothed with a Gaussian of width a = 0.7 pix. 




Fig. 10. Reconstructions of two clusters along the line of sight, located at a redshift of Zc\ = [0.2, 0.6] (top), Zc\ = [0.2, 1.0] 
(middle) and Zc\ = [0.6, 1.0] (bottom). The reconstructions are thresholded at (5 = 3 and smoothed with a Gaussian of 
width a = 0.7 pix. 



